Bayesian Model Averaging Naive Bayes (BMA-NB): Averaging over an Exponential Number of Feature Models in Linear Time
Abstract
Naive Bayes (NB) is well known to be a simple but effective classifier, especially when combined with feature selection. Unfortunately, feature selection methods are often greedy and thus cannot guarantee that an optimal feature set is selected. An alternative to feature selection is to use Bayesian model averaging (BMA), which computes a weighted average over multiple predictors; when the different predictor models correspond to different feature sets, BMA has the advantage over feature selection that its predictions tend to have lower variance on average than those of any single model. In this paper, we show for the first time that it is possible to exactly evaluate BMA over the exponentially sized powerset of NB feature models in time linear in the number of features; this yields an algorithm about as expensive to train as a single NB model with all features, yet one that provably converges to the globally optimal feature subset in the asymptotic limit of data. We evaluate this novel BMA-NB classifier on a range of datasets, showing that it never underperforms NB (as expected) and sometimes offers performance competitive with (or superior to) classifiers such as SVMs and logistic regression while taking a fraction of the time to train.
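The linear-time claim can be illustrated with a small sketch. It is not the paper's algorithm; it only demonstrates the distributive-law identity that makes such an average tractable, under the assumption that each feature contributes an independent include/exclude factor. All names here (`incl`, `excl`, `weight`) and all numbers are hypothetical: `incl[y][k]` stands for a class-conditional likelihood of feature k, `excl[k]` for a class-independent likelihood used when the feature is left out, and `weight[k]` for the weight given to including feature k.

```python
from itertools import product
import math


def brute_force_average(prior_y, incl, excl, weight):
    """Sum explicitly over all 2^K feature subsets (exponential in K)."""
    K = len(weight)
    scores = []
    for y, p_y in enumerate(prior_y):
        total = 0.0
        for mask in product([0, 1], repeat=K):
            term = p_y
            for k, f_k in enumerate(mask):
                if f_k:  # feature k is in this subset
                    term *= weight[k] * incl[y][k]
                else:    # feature k is left out of this subset
                    term *= (1.0 - weight[k]) * excl[k]
            total += term
        scores.append(total)
    return scores


def factored_average(prior_y, incl, excl, weight):
    """Same quantity as a product of K two-term sums (linear in K)."""
    K = len(weight)
    scores = []
    for y, p_y in enumerate(prior_y):
        score = p_y
        for k in range(K):
            score *= weight[k] * incl[y][k] + (1.0 - weight[k]) * excl[k]
        scores.append(score)
    return scores


# Hypothetical toy numbers: 2 classes, 3 features.
prior_y = [0.6, 0.4]
incl = [[0.9, 0.2, 0.5],
        [0.3, 0.7, 0.5]]
excl = [0.66, 0.4, 0.5]
weight = [0.8, 0.5, 0.1]

slow = brute_force_average(prior_y, incl, excl, weight)
fast = factored_average(prior_y, incl, excl, weight)
agree = all(math.isclose(s, f) for s, f in zip(slow, fast))
```

Both functions return unnormalized per-class scores. The brute-force version visits every subset, while the factored version does a constant amount of work per feature and class, which is the sense in which averaging over the powerset can cost about as much as scoring a single NB model.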
Similar resources
Bayesian Model Averaging for Improving Performance of the Naïve Bayes Classifier
Feature selection has proved to be an effective way to reduce model complexity while giving relatively good accuracy, especially when data is scarce or the acquisition of some features is expensive. However, the single selected model may not always generalize well to unseen test data, whereas other models may perform better. Bayesian Model Averaging (BMA) is a widely used approach to...
Predicting waste generation using Bayesian model averaging
A prognosis model has been developed for solid waste generation from households in Hoi An City, a famous tourist city in Viet Nam. Waste sampling, followed by a questionnaire survey, was carried out to gather data. The Bayesian model average method was used to identify factors significantly associated with waste generation. Multivariate linear regression analysis was then applied to evaluate th...
A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier
With the rapid growth in the number of documents, Text Document Classification (TDC) methods have become crucial. This paper presents a hybrid model of Invasive Weed Optimization (IWO) and the Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the large feature space in TDC. TDC includes different actions such as text processing, feature extraction, form...
The application of naive Bayes model averaging to predict Alzheimer's disease from genome-wide data
OBJECTIVE: Predicting patient outcomes from genome-wide measurements holds significant promise for improving clinical care. The large number of measurements (e.g., single nucleotide polymorphisms (SNPs)), however, makes this task computationally challenging. This paper evaluates the performance of an algorithm that predicts patient outcomes from genome-wide data by efficiently model averaging over...
Model Averaging for Prediction with Discrete Bayesian Networks
In this paper we consider the problem of performing Bayesian model averaging over a class of discrete Bayesian network structures consistent with a partial ordering and with bounded in-degree $k$. We show that for $N$ nodes this class contains in the worst case at least $\Omega\left(\binom{N/2}{k}^{N/2}\right)$ distinct network structures, and yet model averaging over these structures can be performed using $O\left(\binom{N}{k} \cdot N\right)$ ...